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Abstract 

This work aims at recovering signals that are sparse on graphs. Compressed sensing offers tech¬ 
niques for signal recovery from a few linear measurements and graph Fourier analysis provides a 
signal representation on graph. In this paper, we leverage these two frameworks to introduce a 
new Lasso recovery algorithm on graphs. More precisely, we present a non-eonvex, non-smooth 
algorithm that outperforms the standard convex Lasso technique. We carry out numerical exper¬ 
iments on three benchmark graph datasets. 


1 Sparse Representation on Graphs 

The goal of this work is to reconstruct signals on graphs that are supposed to be sparse in the graph 
Fourier representation. In this context, we will deal here with two main concepts, graph and sparsity, 
which have gathered a lot of attention in the recent years with the emergence of Compressed Sensing 
and Big Data. Let us introduce briefly these two concepts in the rest of this section. 

Graph/network is a powerful tool to represent complex high-dimensional datasets, in the sense 
that a graph structures data with respect to their similarities. Graphs have become increasingly 
more considered in applications such as search engines, social networks, airline routes, 3D geometric 
shapes, human brain connectivity, etc. Mathematics offer strong theoretical tools to analyze graphs 
with Harmonic Analysis and Spectral Graph Theory. An essential graph analysis tool is the graph 
Laplacian operator, which is the discrete approximation of the continuum Laplace-Beltrami operator 
for smooth manifolds. It is known that the eigenvectors of the Laplace-Beltrami operator provide a 
local parametrization of the manifold [T] . Equivalently, the eigenvectors of the graph Laplacian, also 
called graph Fourier modes, provides a representation of the graph. Given a graph with (V, E, W), V, 
E and W being respectively the set of n nodes, the set of edges and the similarity/adjacency matrix, 
then the (unnormalized) graph Laplacian operator is defined as 

L = D-W, 

where D is the diagonal degree matrix s.t. Du = Wij. L is symmetric and positive-semidefinite, i.e. 
its eigenvalues A^, Vi are nonnegative. The graph Fourier modes are given by the eigenvectors {ui}^^^ of 
L and can be represented by the orthogonal matrix U = {ui ,..., u^) G s.t. U'^U = /. The graph 

Fourier basis U acts as a basis to represent, analyze and process signals on graph. For example, one can 
represent a function f :V on graph as /(i) = fi • Ui{i) where fi = (/, Ui) = f{i) • Ui{i) 

is its Fourier transform. In this paper, we consider three well-known graphs. First, the synthetic LFR 
graph, which was introduced in [2] to study community graphs. Here, the number of nodes is chosen 
to be n = 1, 000, the number of communities is 10 and the degree of community overlapping is /i = 0.4. 
Second, we consider a coarse version (for computational speedup) of the benchmark MNIST dataset 
of NYU [3] with n = 1,176 nodes and the number of classes is 10. Last, we use a coarse version of 
the well-known 20newsgroups dataset of GMU [4] with n = 1,432 nodes and the number of classes is 
20. All three dataset graphs are illustrated on Figured] with their graph Laplacian spectrum. 

^Institute of Electrical Engineering, (xavier.bresson@epfl.ch) 

^Department of Mathematics, Loyola Marymount University (tlaurent@lmu.edu) 

^Department of Mathematics, University of California Los Angeles (jub@math.ucla.edu) 


1 



(a) LFR 


(b) MNIST 


(c) 20NEWS 





Figure 1: Graph and spectrum of LFR, MNIST, 20NEWS. 


Sparse recovery is currently one of the most studied topics in signal processing. The main goal is 
to reconstruct signals that are supposed to be sparse in some basis representation. For example, in 
medical imaging, one of the objectives of sparsity is to speed up MRI acquisition by reconstructing an 
image in the Fourier basis given a small number of Fourier samples. This problem can be generalized to 
find the solution of a underestimated linear system of equations, which is generally ill-posed, with the 
constraint that the solution is sparse. Finding the solution of this problem is however impracticable 
because it is a NP-hard combinatorial problem. But Candes, Romberg, Tao and Donoho showed 
in 0 0 that using an relaxation and under some conditions on the linear operator, known as 
the Restricted Isometry Property (RIP), and the measurements, known as incoherence property, 
there exists a tight convex relaxation of the NP-hard problem, that is easily tractable. However, it 
has recently been observed that the relaxation technique can be improved with reweighed [7], 
ip^p < 1 [8], difference of convex functions ^ 1-^2 [2] and smoothed ^ 1/^2 ratio [10]. These recent works 
suggest that non-convex relaxations may outperform the original £i sparse recovery. In this work, we 
follow this line of research and we introduce a new non-convex algorithm for sparse recovery on graph. 
Specifically, our goal is to improve Lasso problems on graph. 

2 Enhanced Sparsity 

Starting from the standard £i problem for sparse recovery 

min||a;||i s.t. Ux = fo, 

X 

where x is a sparse signal to be recovered, U is the graph Fourier basis, and /o are the given measure¬ 
ments, we propose the following enhanced recovery model 

min||a;||i s.t. Ux = fo, ||x ||2 = 1. 

X 
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The new additional constraint, i.e. the ^2 unit sphere^ is a non-convex set that is here essential for 
enhancing sparse recovery. Basically, it forces the solution to be at the intersection of the ^i-ball and 
the ^ 2 -sphere, which are precisely the locations of sparse points in the Euclidean domain, see Figure 
[21 Observe now that the new constrained £i optimization problem is equivalent to 

min|^ s.t. Ux = fo (1) 

® IfI|2 

The equivalence comes from the fact that the ratio ^ 1/^2 is a zero-homogenous function, i.e. F{ax) = 
F{x)^a > 0. This means that the solution x'^ is the same as ax^^ \/a. Particularly, for the specific 
value of a such that x'^ belongs to the unit sphere ||x '*'||2 = 1. Figure [2] compares geometrically the 
standard £i and the new ratio model At a first glance, both models promote sparsity and 

the new model does not appear to bring anything new but a more complex problem. However, this 
figure acts as a simple illustration and one must remember that the recovery performance depends 
also on the incoherence property about the number of observed measurements. In this context, the 
major motivation to go beyond convexity with the recent works 1318119] is to precisely improve sparse 
recovery with a smaller number of measurements than the standard approach. We will see that the 
newly proposed model holds this property. 




Figure 2: Standard ii and 


3 Optimization 


We consider a different version of ([T]) that is robust to noise: 


min 

X 



+ \\\Ux-f4l 


( 2 ) 


Problem m is a non-smooth and non-convex optimization problem. The /non-smooth part of the 
problem can be handled quite efficiently with techniques introduced in Compressed Sensing such as 
Alternating Direction Method of Multipliers (ADMM) [11] or Uzawa-type Primal-Dual technique [12]. 
However, the non-convex part is more challenging. For general non-convex problems, it is difficult to 
design an algorithm that is fast, accurate, robust and also guaranteed to converge, or at least that 
satisfies the monotonicity property. Monotonicity means that the energy is guaranteed to decrease 
at each iteration, although the problem is non-convex. In this situation, most non-convex algorithms 
only find solutions that are critical points or local minimizers, and rarely global minimizers. 


3.1 Proximal Forward-Backard Splitting Algorithm 

We develop in this section an algorithm for the ratio optimization problem ([2]). A related numerical 
scheme was introduced in [13] in the different context of data clustering. Let T{x) = ||x||i, B{x) = 
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||a;|| 2 , E{x) = T{x)/B{x) and F{x) = l-Hf/a; — /olH such that we want to solve 


T{x) , 

Let us consider a semi-implicit gradient flow for this problem: 

dT{x'^+^) ■ Bix'^) - Tix'^) ■ dB{x^) fe+i. 

where d stands for the subdifferentials of T and B (which is not unique for ii but is for £ 2 ) and is 
the time step. This provides the optimality condition 




T^k k 

xk+i _ ^ 9 0, 


(3) 


where the notations = T{x^) and = B{x^) are used. This leads to a two-step iterative scheme: 


( 1 ) =x'^ + cldB{x’^) 

and 

(2) x^+^ = argminc^T(a;)-h ^F(a;)-h ha;-y||^ 

X II 

where Cq = /B^ and c\ = /B^. The second step is the proximal operator daE! of the convex 

k 

function c^T + Overall, we have designed a proximal forward-backward splitting algorithm to 

solve m as the solution is given by 

= prox^,^^^^(x'= + 4dB{x’^)). (4) 

In the next section, we will show that the proposed iterative algorithm is (almost) monotonic, i.e. its 
energy is guaranteed to decrease at each iteration. 

3.2 Monotonicity 

We show the following quasi-mo notonicity result: 

d/c+1 W^k _ _/c+l ||2 

—_£;'=+!) + (i?fe _ i?'=+l) > U-^-111 (5) 

Proof. Define the convex functions 

g'^ix) = 4 B{x) + t’^F{x), (6) 

F'^ix) = ctT(a;)-h r''F^ (7) 

and observe that g^{x^) = F^{x^) for latter use. We remind the general definition of the subdiffer- 
ential d£ of a convex function £\ 

£{xi)>S{x2) + {xi-X2,y2), yy2 & d£{x2). (8) 

We plug xi = , X 2 = x^ and £1 = ^ in (jHJl: 

g’^{x’^+^) > g^ix'^) + - a;^ dg^{x^)) (9) 
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If we now observe that the first step of the algorithm is ^ with = CQdB{x^) = dQ^{x^) 

then m becomes 

^fe(^fe+i) > (a;fe+i -x^,v’^). (10) 

Let us now plug xi = , X 2 = x^^^ and ^ in ([H|): 

F'‘{x'^) > J’'=(x'=+i) + (x*’ - x'=+\5J’'=(x''+i)). (11) 

Notice that the optimality condition (|31) reads 9 0 and thus e 

dJ-^{x^^^). This implies that (lllll may be written as 

F’^ix'^) > ^'=(x'=+i) + (x'=-x'=+S/-x''+i) 

> J^'^{x'^+^) + \\x'^-x^+^\\l + {x'^-x'^+^,v^) ( 12 ) 

Adding (flOl) and (IT2ll and using the fact that Q^{x^) = iF^{x^) we have 

gfc(2.fc+i) > j-fe(2.fc+i) ii^-fe _ a;*+i||2 (13) 

Using the definition © and ©, this inequality can be rewritten as which is the desired result. □ 


Notes. Observe that close to the steady-state solution, we have ^ 1 for /c ^ oc and the 

quasi-monotonicity tends to a monotonicity property. Second, see that if we had access to the quantity 
^/e+i ^ good estimation) then we would set tq and this would imply 


7Ti/c 7Ti/e + l ^ 

^Tot ~ ^ 


Tot 


where Etoi = E F, and thus unconditional monotonicity for any tq. 


4 Applications 


4.1 Enhanced Lasso on Graphs 

The Algorithm. The standard Lasso problem on graph is min^, ||x||i + f||^^~/o ||2 where U is the 
sensing matrix, here the graph Fourier modes. Function /o is the signal measured on the graph. It is 
generated as /o = (xq + n) where xq is a pure sparse signal with 5% of non-zero entries uniformly 
chosen between [—1,1] and n is the noise, a Gaussian distribution with standard deviation a = 0.1. 
The goal is to recover the sparse signal xq. We recall that the proposed enhanced Lasso problem 
on graph is min^, -pjl^ + “ /olll- We use the proximal forward-backward splitting algorithm 

introduced in Section I tT] to solve it. That is. Step 1: = x^ + = x^ and 

Step 2: = argmin^, F{x) -h G{x) where F{x) = ||x||i and G{x) = ^^\\Ux — /oUl + 

We may write this problem as a saddle-point problem min^, max^ (p, x) — F*(p) + G{x) where F^ is 
the barrier function of the ioo unit ball such that 


F^ 



0 

-hoo 


if IpI < 1, 

otherwise. 


Note that G{x) is uniformly convex so that we can apply the accelerated primal-dual algorithm of 
p4] . The algorithm consists in iterating the following steps: 


pn+l 

= prox,„^*(p" + (T"x") 

(14) 

^n+1 

= prox^n c (x" - 7?”p”+^) 

(15) 

Qn+1 

= l/yi + 277?”,r"+i =6»"+q",c7"+i =c7"/6>"+i 

(16) 

x^+^ 

= x”+i+r+i(x”+i-x”) 

(17) 
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The scheme converges quickly, with order 0(l/n^), provided that = r]^ = 1. The first inner 
proximal problem has an analytical solution 

prox^n^^( 2 ;) = zj max{l, \z\}, 


and the second inner proximal problem has also a closed-form solution 


prox^„G(^^) = 


1 + E^Xrf^ + E^rfjT^ 


As the two proximal operators are fast to solve, so it is for the general algorithm. In fact, solving the 
non-convex ratio problem m for sparse recovery can he seen as solving the standard Lasso problem 
with the addition of a convex quadratic term \\x — y ^\\2 ci'nd updating y^ each time the monotonicity 
condition is satisfied. We summarize the algorithm here. 


Algorithm. Initialize ^ = rf' ^ = 1,7 = 1, and iterate k until convergence 

( 1 ) 

(2) y^=x^ + E’^^^ 

(3) Inner loop: iterate n until the monotonicity condition, /B^{E^ — E'^) + — E'^) > \\x^ — 

x'^\W/t^^ is satisfied: 

(3i) p'^^^ = {p'^ + (j^x'^)l max{l, \p'^ + a^x'^W 

(oyA ^n+l _ /r^ 

1+E^Xp^+E^p^/r^ 

(3iii) (9^+^ = l/Vl + 277^,r^+i = = cr^/6>^+i 


(3iv) 


^n+1 _ j.n+1 


.6|n+l( 


j.n+1 _ ^n\ 


(4) X^ = 

Note: the time step = B^ was chosen experimentally, and is the subject of future study. 


Numerical Experiments. We compare standard Lasso and enhanced Lasso on graphs. We test on 
the LFR, MNIST and 20NEWS graphs. The value of the parameter A that balances the sparsity term 
and the fidelity term is chosen to minimize the recovery error defined as ||:r — xo|| 2/||^||2 for all models 
and all graphs. The results are reported on Table [T] and Figure [3l Overall, the proposed enhanced 
Lasso model performs better than the standard one, but it is 2-3 times slower. 



Standard Lasso 

Proposed Lasso 

LFR 

0.419 

0.309 

MNIST 

0.417 

0.302 

20NEWS 

0.481 

0.325 


Table 1: Accuracy for standard Lasso vs proposed Lasso on three graphs. 


4.2 Enhanced Lasso-Inpaiting on Graphs 


The Algorithm. In this section, we add a layer of difficulty by removing a set of observed measure¬ 
ments in /q. In other words, we do not observe the whole function /o but only a portion of it. This 
problem is equivalent to a Lasso-Inpainting problem. For this, a diagonal selector matrix R is added 
to the linear operator U such that 


D _ / ^ if ^ G LIqJjs’) 

I 0 otherwise, 

Llobs being the set of observed measurements, and Ra = 0 otherwise. The formulation is thus 
min^, ||x||i -f- ^\\RUx — /oUl- The enhanced Lasso-Inpainting is naturally 


min 

X 



+ ^\\RUx-M\i 
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Figure 3: Standard Lasso vs Proposed Lasso on three graphs. 


We apply the same technique as in Section 03] to compute a solution to the problem. The only change 
is the solution of the inner proximal problem prox^„Q (^) = U^{UblK) where b = z + E^Xrj'^RU* fo ^ 
EkfjUykK = I E^Xy^R H- E^rj^/rk^ which is also fast to compute. 

Numerical Experiments. We compare standard Lasso-Inpainting and enhanced Lasso-Inpainting 
on graphs. We test on the LFR, MNIST and 20NEWS graphs. We remove 40% of measurements 
of /o with R. The value of the parameter A is again chosen to minimize the recovery error defined 
as ||x — xo 112 / 11^112 for all models and all graphs. The results are reported on Tableland Figure IH 
Overall, the proposed enhanced Lasso-Inpainting model also performs better than the standard one, 
but it is 2-3 times slower. 



Standard Lasso-Inp 

Proposed Lasso-Inp 

LFR 

0.667 

0.540 

MNIST 

0.509 

0.362 

20NEWS 

0.516 

0.468 


Table 2: Accuracy for standard Lasso-Inpainting vs proposed Lasso-Inpainting on three graphs. 


5 Conclusion 

A new sparse recovery algorithm for Lasso-type problems on graph has been introduced. Numerical 
experiments have shown improvements over the standard £i algorithms. This result leverages the 
recent idea to go beyond ii convexity and explore non-convex, non-smooth techniques to find bet¬ 
ter sparse solutions. In this context, the closest works to ours are (i) the difference of convex (DC) 
functions [9] and (ii) the smoothed ^ 1/^2 technique [TO]. We would like to explore in a future work 
the relationship between our model and these models. Particularly, a direct application of Dinkelbach 
technique m reveals that minimizing the ratio is equivalent to minimize the DC model ii — ai 2 with 
a being the minimum value of the ratio ^ 1 /^ 2 - As a result, an interesting question is whether this a 
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Figure 4: Standard Lasso-Inpainting vs Proposed Lasso-Inpainting on three graphs. 


value, which is automatically learned with the proposed algorithm, can provide satisfying solutions 
for a range of sparse problems. Eventually, we would like to compare our exact ^ 1/^2 ratio technique, 
which has a weak monotonicity property, with the smoothed ratio technique of m, which has a strong 
monotonicity feature. 
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